Cocojunk

🚀 Dive deep with CocoJunk – your destination for detailed, well-researched articles across science, technology, culture, and more. Explore knowledge that matters, explained in plain English.

Navigation: Home

Cryptographically secure pseudorandom number generator

Published: Sat May 03 2025 19:23:38 GMT+0000 (Coordinated Universal Time) Last Updated: 5/3/2025, 7:23:38 PM

Read the original article here.


The Forbidden Code: Understanding Cryptographically Secure Pseudorandom Number Generators (CSPRNGs) - Beyond Simple 'Random'

In the world of secure systems, relying on simple, predictable "random" numbers is a critical vulnerability they often won't teach you about in standard programming classes. This is where the concept of a Cryptographically Secure Pseudorandom Number Generator (CSPRNG) comes in. Unlike the common pseudorandom number generators (PRNGs) used for simulations or games, CSPRNGs are built with the stringent demands of cryptography in mind, operating in the shadows where true unpredictability and resistance to attack are paramount.

This resource dives deep into what makes a CSPRNG 'secure', their underlying principles, common designs, and the critical failures that have exposed their vulnerabilities – the kind of knowledge essential for anyone venturing into the secure corners of programming.

What is a CSPRNG?

Let's start with a fundamental definition:

A cryptographically secure pseudorandom number generator (CSPRNG), also known as a cryptographic pseudorandom number generator (CPRNG) or cryptographic random number generator (CRNG), is a pseudorandom number generator (PRNG) possessing specific properties that render it suitable for use in cryptographic applications.

Essentially, a CSPRNG is a PRNG that meets tough security standards. While a regular PRNG aims to produce sequences that appear random based on statistical tests, a CSPRNG must produce sequences that are computationally indistinguishable from true randomness, even when faced with an adversary trying to predict its output.

Why are CSPRNGs Essential for Security?

Many cryptographic operations rely on numbers that are unpredictable to an attacker. Using a standard, non-cryptographic PRNG for these purposes is a common and potentially catastrophic mistake. Here's where CSPRNGs are critical:

  • Key Generation: Generating secret keys for encryption or signing. These keys must be highly unpredictable, as their compromise breaks the entire security scheme.
  • Initialization Vectors (IVs): Non-secret but unique values used in block cipher modes (like CBC or CTR) to ensure that encrypting the same plaintext multiple times results in different ciphertext. While not needing the same level of entropy as a key, they need to be unpredictable in advance to prevent certain attacks.
  • Nonces: Numbers used only once in a cryptographic communication. They are often used to prevent replay attacks. Uniqueness is the primary requirement, but unpredictability is also often necessary depending on the protocol.
  • Salts: Random data added to input (like a password) before hashing. Salts protect against pre-computation attacks (like rainbow tables). The salt needs to be unique and unpredictable to be effective. In some signature schemes (like ECDSA and RSASSA-PSS), random values used during signing also function like salts and require CSPRNG quality.
  • Token Generation: Creating unpredictable session tokens, password reset tokens, or other short-lived secrets that authenticate a user or action. Predictable tokens can lead to session hijacking or unauthorized access.

The "quality" of randomness needed varies. A nonce primarily requires uniqueness, but generating a strong master key demands much higher quality randomness derived from a source with high entropy. While true random number generators (TRNGs), based on physical processes (like atmospheric noise or hardware noise), offer high entropy, they are often slow and limited in output. CSPRNGs allow stretching a small amount of high-quality initial randomness (the "seed" or "state") into a much larger stream of pseudorandom bits while maintaining cryptographic security properties.

Entropy: In the context of random number generation, entropy refers to the measure of unpredictability or randomness present in a source. A high-entropy source produces outputs that are difficult to predict. Entropy is often measured in bits.

CSPRNGs can provide an abundance of pseudorandom numbers quickly once initialized with sufficient entropy, overcoming the speed limitations of true random sources. However, the security of the CSPRNG's output is fundamentally limited by the quality of the initial entropy used to seed it. If the initial seed is predictable, the entire sequence generated from it will also be predictable.

The High Bar: CSPRNG Requirements

Merely passing basic statistical tests is insufficient for cryptographic security. CSPRNGs must satisfy two critical groups of requirements that go far beyond those of ordinary PRNGs:

  1. Statistical Randomness (Computational Indistinguishability):

    • A CSPRNG must pass the next-bit test. This means that given any sequence of the first k bits generated by the CSPRNG, there is no computationally efficient (polynomial-time) algorithm that can predict the (k+1)th bit with a probability of success significantly better than 50% (i.e., better than a coin flip).
    • Context: Andrew Yao proved in 1982 that if a generator passes the next-bit test, it will pass all other polynomial-time statistical tests for randomness. This theoretical link is a cornerstone of CSPRNG design. It means that from a practical perspective, any attacker with limited time and computing power cannot statistically distinguish the output of a good CSPRNG from truly random noise.
  2. Resistance to Attack (State Compromise Security): This is perhaps the most critical requirement distinguishing CSPRNGs in the "Forbidden Code" context, as it directly addresses scenarios involving partial system compromise.

    • CSPRNGs must withstand state compromise extension attacks. This means:
      • Backward Secrecy (or History Secrecy): If an attacker gains knowledge of the CSPRNG's internal state at a certain point in time, they must find it computationally infeasible to reconstruct or determine any part of the sequence of random numbers generated prior to that point.
      • Forward Secrecy: If an attacker gains knowledge of the CSPRNG's internal state, it must be computationally infeasible for them to predict future output unless new, uncompromised entropy is mixed into the state. (Note: The definition provided in the original text primarily focuses on forward security where the state itself is updated and output, making the previous state irrecoverable).

Example of Failure: Consider a theoretical PRNG that simply outputs the binary digits of Pi, starting from a position determined by a secret seed (the starting position). Pi's digits might pass statistical tests (it's conjectured to be a normal number), potentially satisfying the next-bit test. However, if an attacker learns the internal state (the current position in Pi), they can easily calculate all previous digits and all future digits. This fails the state compromise resistance requirement completely, making it unsuitable as a CSPRNG.

Most standard PRNGs fail both of these critical requirements. They often produce sequences that, while statistically random under casual inspection, exhibit patterns discoverable with targeted cryptanalysis. Furthermore, their state is typically designed to be easily reversible, allowing full prediction of past and future outputs if the state is revealed. CSPRNGs are specifically engineered to prevent this kind of compromise.

The Theory Underneath: Formal Definitions

For cryptographers, the security of a CSPRNG is rooted in formal computational definitions.

In the asymptotic setting, a family of deterministic polynomial time computable functions $G_k : {0,1}^k \to {0,1}^{p(k)}$ for some polynomial $p$, is a pseudorandom number generator (PRNG) if it stretches the length of its input ($p(k) > k$ for any $k$), and if its output is computationally indistinguishable from true randomness...

This formal definition introduces concepts like "polynomial time computable" (meaning it runs efficiently) and "computationally indistinguishable."

Computationally Indistinguishable: Two probability distributions (in this case, the output of the CSPRNG seeded with a random value, and a truly random sequence of the same length) are computationally indistinguishable if no probabilistic polynomial-time algorithm (an efficient algorithm that can use randomness) can distinguish between samples from the two distributions with a success probability significantly better than random guessing. The "negligible function" $\mu(k)$ in the definition means the difference in probabilities becomes vanishingly small as the input size $k$ increases.

The definition highlights that a CSPRNG isn't truly random (it's deterministic given the seed), but its output looks like true randomness to any efficient attacker. The equivalence to the next-bit predictability test provides a more intuitive understanding for practitioners: if you can't predict the next bit, you can't efficiently distinguish the sequence from random.

The concept of Forward Security is also given a formal definition:

A forward-secure PRNG with block length $t(k)$ is a PRNG $G_k : {0,1}^k \to {0,1}^k \times {0,1}^{t(k)}$, where the input $s_i$ (state) is length $k$, and the output $(s_{i+1}, y_i)$ consists of the next state $s_{i+1}$ and the output block $y_i$. If the initial state $s_1$ is random, then for any $i$, the sequence $(y_1, y_2, \dots, y_i, s_{i+1})$ must be computationally indistinguishable from $(r_1, r_2, \dots, r_i, s_{i+1})$, where $r_i$ are random.

This formalizes the state compromise resistance. If you see the first $i$ output blocks ($y_1, \dots, y_i$) and the next state ($s_{i+1}$), this sequence should look like random outputs followed by an independent random value (the next state). Critically, learning $s_{i+1}$ doesn't help you learn anything about $y_1, \dots, y_i$ beyond what you already observed. A common way to achieve this, as described in the text, is by splitting the generator's output into two parts: the next state and the actual output block. The next state is computed from the current state but is designed so that the previous state cannot be derived from it.

Fueling the Generator: Entropy Sources and Extraction

CSPRNGs are deterministic algorithms. Their output is completely determined by their initial internal state, often called the "seed". If this seed is predictable, the entire output sequence is predictable, regardless of how good the algorithm is. Therefore, CSPRNGs require an initial seed derived from a source of high-quality, unpredictable randomness (entropy).

Typically, this entropy is gathered from hardware sources or environmental noise accessible through the operating system's randomness API (like /dev/random on Unix-like systems or CryptGenRandom on Windows). Sources can include:

  • Hard drive timings
  • Keyboard and mouse input timings
  • Interrupt timings
  • Network activity variations
  • Temperature sensor readings

However, collecting true entropy is often slow, and the amount of entropy available might be less than desired, especially on embedded systems or during boot-up before much environmental noise has occurred. Furthermore, there can be subtle correlations or biases in ostensibly independent processes that provide entropy.

This necessitates techniques for entropy extraction and pooling:

  • Pooling: Many practical CSPRNGs maintain an "entropy pool" where bits from various low-rate, noisy sources are mixed together to accumulate sufficient randomness before being used to seed or reseed the generator.
  • Extraction: Algorithms can be used to extract high-quality randomness from potentially biased or weakly random sources.
    • Von Neumann De-biasing: A classic, simple technique. It processes bits in pairs. For "01", output 0. For "10", output 1. Discard "00" and "11". This removes bias but discards about 75% of the input bits and requires bits to be independent within pairs.
    • Combining Weak Sources: As proven by Santha and Vazirani, multiple streams with weak, but uncorrelated, randomness can be combined (e.g., using an "extractor" function) to produce a stream with higher entropy.

Practical CSPRNG implementations often continuously mix new entropy sources into their internal state while running. This isn't strictly part of the pseudorandom generation function itself (which is deterministic from its state), but it's a crucial security mechanism. This continuous reseeding helps restore forward security even if the state is compromised, as the mixed-in entropy makes the subsequent state unpredictable to the attacker.

Building Blocks: CSPRNG Design Strategies

CSPRNGs are designed using various cryptographic and mathematical principles. They generally fall into two categories:

1. Designs Based on Cryptographic Primitives

These designs leverage well-studied cryptographic functions like block ciphers, stream ciphers, and hash functions, which are presumed to be secure.

  • Block Cipher Based: A common approach uses a secure block cipher (like AES) in a mode that turns it into a stream cipher. The Counter Mode (CTR DRBG) is a standard example.
    • How it works (Simplified CTR DRBG): You have a secret key (derived from the CSPRNG seed/state) and a counter. To generate random bits, you encrypt the counter using the key. The output of the encryption is the pseudorandom block. You increment the counter for the next block. The state update involves updating the key and/or counter based on the seed and potentially new entropy.
    • AES-CTR_DRBG: A widely used specific instance employing the Advanced Encryption Standard (AES).
    • NIST SP 800-90A Details: The NIST standard for DRBGs (Deterministic Random Bit Generators) specifies various constructs. CTR_DRBG is one. Interestingly, the standard initially specified erasing the key after each request for randomness to ensure forward secrecy. This was very inefficient. Later considerations led to an "extended interface" where the key might persist, raising concerns about potential misuse and key exposure time. Newer "fast-key-erasure" RNGs aim to balance performance and security by erasing the key as soon as randomness is consumed.
  • Stream Cipher Based: Some stream ciphers can directly function as CSPRNGs. The stream cipher's key is the CSPRNG state, and the generated keystream bits are the pseudorandom output. Examples include RC4 (though RC4 is now considered insecure for most uses due to biases), ISAAC, and ChaCha20.
  • Cryptographic Hash Function Based: Hash functions (like SHA-256) can also form the core of a CSPRNG (e.g., Hash DRBG in NIST SP 800-90A). This often involves hashing a secret state value, potentially concatenated with a counter or other data, and using the hash output as the random bits.
  • HMAC Based: HMAC (Hash-based Message Authentication Code) combines a cryptographic hash function and a secret key. It can also be used as the basis for a CSPRNG (HMAC DRBG in NIST SP 800-90A), often considered robust because of HMAC's strong security properties.

2. Designs Based on Hard Mathematical Problems

These designs base their security on the presumed difficulty of solving certain mathematical problems, similar to how public-key cryptography works.

  • Blum Blum Shub (BBS): This algorithm's security is theoretically linked to the difficulty of the quadratic residuosity problem, which is, in turn, related to the difficulty of integer factorization. If factoring large numbers is hard, BBS is secure.
    • Context: While theoretically sound (its security proof is strong conditional on factoring being hard), BBS is notorious for being extremely inefficient. It's rarely used in practice unless unprecedented levels of security are required, making it almost a theoretical curiosity for most developers.
  • Blum–Micali: An earlier example, its security is based on the difficulty of the discrete logarithm problem. Also considered theoretically sound but generally inefficient for practical use compared to primitive-based designs.
  • Dual EC DRBG: Based on elliptic curve mathematical problems (specifically, variations of the Diffie-Hellman problem and related point problems).
    • Context: This algorithm is infamous. While initially proposed with security proofs (based on assumptions like the Decisional Diffie-Hellman assumption), these proofs had caveats regarding the specific parameters and output length. As we'll see in the "Security Flaws" section, this design became a prime example of how even a theoretically based CSPRNG can be compromised in practice.

Running in the Wild: Practical CSPRNG Schemes

A practical CSPRNG implementation is more than just an algorithm; it includes mechanisms for gathering initial entropy ("seeding") and potentially mixing in more entropy over time to maintain robustness. Here are some notable examples:

  • /dev/random and /dev/urandom (Unix-like systems): These are interfaces provided by the operating system kernel.

    • /dev/random: Attempts to estimate the amount of entropy available and will block (wait) if it believes there isn't enough "true" randomness to fulfill a request. This can sometimes cause systems to stall.
    • /dev/urandom: Will not block. If the estimated entropy pool is empty, it will continue to generate pseudorandom numbers based on the existing state. While these numbers are still cryptographically secure given the state, they lack forward secrecy if the state is compromised and no new entropy has been mixed in since the compromise. For most cryptographic purposes, /dev/urandom is preferred as blocking can lead to denial-of-service issues, and the security properties are generally sufficient if the initial seeding was adequate.
    • Internal Designs: The actual algorithms powering these differ across operating systems:
      • Yarrow: An older design (used in macOS until recently) that attempted to estimate entropy. It used SHA-1 and 3DES internally.
      • Fortuna: The successor to Yarrow (now used in FreeBSD and modern Apple OSs). It explicitly does not attempt to estimate entropy, simplifying the design. It uses SHA-256 and a good block cipher. It relies on periodic reseeding from entropy sources.
      • Linux Kernel CSPRNG: Uses modern, high-performance primitives like ChaCha20 for generating random data and BLAKE2s for mixing entropy into the pool.
  • arc4random (Unix-like systems): A user-space CSPRNG API that traditionally seeded itself from /dev/random. While originally based on the RC4 stream cipher, most major implementations have replaced the core algorithm with ChaCha20 due to RC4's known weaknesses, while keeping the familiar API name.

  • CryptGenRandom (Windows): Part of Microsoft's CryptoAPI. It provides a CSPRNG facility on Windows systems. The underlying implementation has varied significantly between Windows versions.

  • ANSI X9.17 (Financial Institution Standard): An older standard (also adopted as a FIPS standard) that defined a specific block cipher-based CSPRNG.

    • How it works (Simplified): Uses a block cipher (originally TDEA, but generizable to AES) with a fixed key k. It uses the current date/time D_T and a state variable V. The random output R is generated as E_k(D_T XOR V) where E_k is encryption with key k. The next state V_next is calculated as E_k(R XOR D_T).
    • Weakness: The critical weakness cited (leading to the development of Yarrow) is that if the secret key k is ever leaked or compromised, the entire past and future stream of random numbers can be calculated, regardless of the state V or time D_T. This design lacks forward security dependent only on the state update.

It's crucial to note that many of these practical schemes (all except the strict ANSI X9.17 definition described) are not "pure" PRNGs in the theoretical sense, as they periodically or continuously incorporate new, external entropy into their state. This makes them DRBGs (Deterministic Random Bit Generators) that are seeded and potentially reseeded by external non-deterministic entropy sources (TRNGs). This hybrid approach is a robust defense against state compromise, as new entropy makes future outputs unpredictable even if the current deterministic state is known.

The Nightmare Scenario: Attacks and Security Flaws

Understanding how CSPRNGs fail is as important as knowing how they work, especially in the "Forbidden Code" domain where seemingly minor details can have catastrophic consequences. The history of CSPRNGs includes significant failures, sometimes deliberate.

The NSA Kleptographic Backdoor in Dual EC DRBG

Perhaps the most infamous CSPRNG failure, revealed by the Edward Snowden documents in 2013.

  • The Algorithm: Dual Elliptic Curve Deterministic Random Bit Generator (Dual EC DRBG) was one of the designs included in the NIST SP 800-90A standard. Its security relied on elliptic curve cryptography and specific "points" (parameters P and Q) defined within the standard.
  • The Flaw: Cryptographers had long suspected a potential backdoor in Dual EC DRBG. Due to the specific mathematical properties used, if someone knew a secret value related to the publicly defined points P and Q (the relationship between P and Q being the discrete logarithm of Q with respect to P), they could predict the generator's output after observing just 32 bytes.
  • The Revelation: The Snowden leaks confirmed that the NSA had secretly influenced NIST to make Dual EC DRBG the default recommended CSPRNG in SP 800-90A, and that the specific parameters P and Q were chosen such that the NSA knew the hidden relationship, effectively giving them a kleptographic backdoor. They could easily predict the output of any implementation using the standard parameters, allowing them to decrypt vast amounts of traffic encrypted using keys or nonces generated by this CSPRNG.
  • The Impact: The revelation was a major scandal. Companies like RSA Security were shown to have continued using Dual EC DRBG despite the public warnings, allegedly having received a $10 million payment from the NSA. This demonstrated how governments can actively undermine cryptographic standards to facilitate surveillance, turning a supposedly secure tool into a vulnerability.

The DUHK Attack

A more recent example (2017) highlighting implementation flaws.

  • The Attack: Don't Use Hard-coded Keys (DUHK). This attack targeted implementations using the older ANSI X9.31 standard, specifically the block cipher based CSPRNG mentioned earlier.
  • The Vulnerability: Instead of generating or securely managing the secret key k required by the ANSI X9.31 algorithm, some hardware vendors hardcoded the same key k into many devices.
  • The Result: If an attacker knew (or could discover) this hardcoded key (which is static across all vulnerable devices), they could then exploit the weakness of the ANSI X9.31 design. By observing encrypted data and knowing the hardcoded key, they could work backward to brute-force the CSPRNG state and ultimately deduce master encryption keys used for things like WPA2 Wi-Fi or VPN connections. This was a failure not just of the older X9.17/X9.31 design's forward security compared to modern ones, but a critical implementation error by vendors.

Historical Note: The Japanese PURPLE Cipher Machine

Even historical systems demonstrate the fundamental need for unpredictable randomness. During WWII, the Japanese PURPLE cipher machine's security was severely weakened because the values used for its keys were not generated with sufficient randomness. This allowed the United States to cryptanalyze the system and read diplomatic communications, a critical intelligence advantage ("MAGIC"). This historical example underscores that the need for high-quality random numbers in cryptography is not a new problem.

Standardization: The Quest for Trust

Given the critical role of CSPRNGs and the risks associated with insecure or backdoored designs, various standards bodies have published guidelines and algorithms.

  • NIST (National Institute of Standards and Technology):

    • FIPS 186-4 (Digital Signature Standard) includes requirements for random number generation for signature schemes like DSA and ECDSA.
    • NIST SP 800-90A Rev.1 (Deterministic Random Bit Generators) specifies approved DRBG constructs (Hash_DRBG, HMAC_DRBG, CTR_DRBG) and provides guidance on their implementation and seeding. This is the standard where Dual EC DRBG was controversially included and later removed.
    • NIST also provides standards for statistically testing random number generators (NIST Special Publication 800-22), which helps evaluate whether an output appears random, though passing these tests alone is not sufficient proof of cryptographic security.
  • ANSI (American National Standards Institute):

    • ANSI X9.17, X9.31, X9.62 defined older block cipher and number-theoretic based generators. As seen with X9.31's use in the DUHK attack and X9.17's forward secrecy weakness, some of these older standards have known issues compared to modern designs.

Relying on standardized and well-vetted CSPRNG designs is generally recommended, but the Dual EC DRBG incident serves as a stark reminder that even standards can be compromised and require careful scrutiny and vigilance.

Conclusion

CSPRNGs are a vital, often hidden, component of secure software. They are not simple utilities but complex cryptographic constructs that must resist sophisticated attacks. Understanding the difference between a standard PRNG and a CSPRNG, the critical requirements of state compromise resistance and computational indistinguishability, the importance of high-quality entropy, and the potential for design flaws or deliberate backdoors (like Dual EC DRBG) is essential for anyone building secure systems. The "Forbidden Code" isn't just about clever tricks; it's about recognizing the foundational elements of security that demand deep understanding and careful implementation, because the consequences of getting randomness wrong are severe.

See Also